Abstract. In many networks, it is of great interest to identify communities, unusually densely knit groups of 
individuals. Such communities often shed light on the function of the networks or underlying properties of 
the individuals. Recently, Newman suggested modularity as a natural measure of the quality of a network 
partitioning into communities. Since then, various algorithms have been proposed for (approximately) 
maximizing the modularity of the partitioning determined. In this paper, we introduce the technique 
of rounding mathematical programs to the problem of modularity maximization, presenting two novel 
algorithms. More specifically, the algorithms round solutions to linear and vector programs. Importantly, 
the linear programming algorithm comes with an a posteriori approximation guarantee: by comparing the 
solution quality to the fractional solution of the linear program, a bound on the available "room for 
improvement" can be obtained. The vector programming algorithm provides a similar bound for the best 
partition into two communities. We evaluate both algorithms using experiments on several standard test 
cases for network partitioning algorithms, and find that they perform comparably or better than past 
algorithms. 



1 INTRODUCTION 

Many naturally occurring systems of interacting entities 
can be conveniently described using the notion of net- 
works. Networks (or graphs) consist of nodes (or vertices) 
and edges between them [39]. For example, social networks 
[42,44] describe individuals and their interactions, such as 
friendships, work relationships, sexual contacts, etc. Hy- 
perlinked text, such as the World Wide Web, consists of 
pages and their linking patterns [29] . Metabolic networks 
model enzymes and metabolites with their reactions [21]. 

In analyzing and understanding such networks, it is 
frequently extremely useful to identify communities, which 
are informally defined as "unusually densely connected 
sets of nodes". Among the benefits of identifying com- 
munity structure are the following: 

1. Frequently, the nodes in a densely knit community 
share a salient real-world property. For social networks, 
this could be a common interest or location; for web 
pages, a common topic or language; and for biologi- 
cal networks, a common function. Thus, by analyzing 
structural features of a network, one can infer semantic 
attributes. 

2. By identifying communities, one can study the communities
individually. Different communities often exhibit significantly
different properties, making a global analysis of the network
inappropriate. Instead, a more detailed analysis of individual
communities leads to more meaningful insights, for instance into
the roles of individuals.

3. Conversely, each community can be compressed into a single
"meta-node", permitting an analysis of the network at a coarser
level, and a focus on higher-level structure. This approach can
also be useful in visualizing an otherwise too large or complex
network.



For a much more detailed discussion of these and other 
motivations, see for instance [37]. Due to the great impor- 
tance of identifying community structure in graphs, there 
has been a large amount of work in computer science, 
physics, economics, and sociology (for some examples, see 
[11,13,18,35,37]). At a very high level, one can identify 
two lines of work. In one line [13, 14], dense communities 
are identified one at a time, which allows vertices to be 
part of multiple communities. Depending on the context, 
this may or may not be desirable. Often, the communi- 
ties identified will correspond to some notion of "dense 
subgraphs" [4,13,14,23]. 

An alternative is to seek a partition of the graph into 
disjoint communities, i.e., into sets such that each node 
belongs to exactly one set. This approach is preferable 
when a "global view" of the network is desired, and is 
the one discussed in the present work. It is closely related 
to the problem of clustering; indeed, "graph clustering", 
"partitioning", and "community identification" are often, 
including here, used interchangeably. 

Many approaches have been proposed for finding such
partitions, based on spectral properties, flows, edge
agglomeration, and many others (for a detailed overview and
comparison, see [37]). The approaches differ in whether or
not a hierarchical partition (recursively subdividing
communities into sub-communities) is sought, whether the
number of communities or their size is pre-specified by
the user or decided by the algorithm, as well as other
parameters. For a survey, see [35].

A particularly natural approach was recently proposed 
by Newman and Girvan [36, 40]. Newman [36] proposes to 
find a community partition maximizing a measure termed 
modularity. The modularity of a given clustering is the
number of edges inside clusters (as opposed to crossing
between clusters), minus the expected number of such edges
if the graph were random conditioned on its degree
distribution [40]. Subsequent work by Newman et al. and
others has shown empirically that modularity-maximizing 
clusterings often identify interesting community structure 
in real networks, and focused on different heuristics for ob- 
taining such clustering [8,9,11,36-38,40]. For a detailed 
overview and comparison of many of the proposed heuris- 
tics for modularity maximization, see [10]. 

Remark 1 It should be noted that graph communities found 
by maximizing modularity should be judged carefully. While 
modularity is one natural measure of community structure 
in networks, there is no guarantee that it captures the par- 
ticular structure relevant in a specific domain. For exam- 
ple, Fortunato and Barthelemy [15] have recently shown 
that modularity and more generally, each "quality func- 
tion" (characterizing the quality of the entire partition 
in one number) have an intrinsic resolution scale, and 
can therefore fail to detect communities smaller than that 
scale. More fundamentally, Kleinberg [28] has shown that 
no single clustering method can ever satisfy four natural 
desiderata on all instances. 

Recently, Brandes et al. [3] have shown that finding 
the clustering of maximum modularity for a given graph 
is NP-complete. This means that efficient algorithms to
always find an optimal clustering, in time polynomial in 
the size of the graph for all graphs, are unlikely to exist. It 
is thus desirable to develop heuristics yielding clusterings 
as close to optimal as possible. 

In this paper, we introduce the technique of solving 
and rounding fractional mathematical programs to the 
problem of community discovery, and propose two new 
algorithms for finding modularity-maximizing clusterings. 
The first algorithm is based on a linear programming (LP) 
relaxation of an integer programming (IP) formulation. 
The LP relaxation will put nodes "partially in the same
cluster". We use a "rounding" procedure due to Charikar
et al. [5] for the problem of Correlation Clustering [1]. The
idea of the algorithm is to interpret "partial membership 
of the same cluster" as a distance metric, and group to- 
gether nearby nodes. 

The second algorithm is based on a vector programming (VP)
relaxation of a quadratic program (QP). It recursively splits
one partition into two smaller partitions while a better
modularity can be obtained. It is similar in spirit to an
approach recently proposed by Newman [37,38], which repeatedly
divides clusters based on the first eigenvector of the
modularity matrix. Newman's approach can be thought of as
embedding nodes in the interval $[-1, 1]$, and then cutting the
interval in the middle. The VP embeds nodes on the surface of a
high-dimensional hypersphere, which is then randomly cut into
two halves containing the nodes. The approach is thus very
similar to the algorithm for Maximum Cut due to Goemans and
Williamson [20].

A significant advantage of our algorithms over past 
approaches is that they come with an a posteriori error 
bound. The value obtained by the LP relaxation is an 
upper bound on the maximum achievable modularity. Al- 
though in principle, this bound could be loose, it was very 
accurate in all our test instances. By comparing the mod- 
ularity obtained by an algorithm against the LP value, we 
can estimate how close to optimal the solution is. Simi- 
larly, the value of the VP relaxation gives a bound on the 
best division of the graph into two communities. 

We evaluate our algorithms on several standard test 
cases for graph community identification. On every test 
case where an upper bound on the optimal solution could 
be determined, the solution found using both our algo- 
rithms attains at least 99% of the theoretical upper bound; 
sometimes, it is optimal. In addition, both algorithms match 
or outperform past modularity maximization algorithms 
on most test cases. Thus, our results suggest that these 
algorithms are excellent choices for finding graph commu- 
nities. 

The performance of our algorithms comes at a price of 
significantly slower running time and higher memory re- 
quirements. The bulk of both time and memory are con- 
sumed by the LP or VP solver; the rounding is compar- 
atively simple. Mostly due to the high memory require- 
ments, the LP rounding algorithm can currently only be 
used on networks of up to a few hundred nodes. The VP 
rounding algorithm has lower running time and memory 
requirements than the LP method and scales to networks 
of up to a few thousand nodes on a personal desktop com- 
puter. 

We believe that despite their lower efficiency, our algo- 
rithms provide three important contributions. First, they 
are the first algorithms with guaranteed polynomial run- 
ning time to provide a posteriori performance guarantees. 
Second, they match or outperform past algorithms for 
medium-sized networks of practical interest. And third, 
the approach proposed in our paper introduces a new 
algorithmic paradigm to the physics community. Future 
work using these techniques would have the potential to 
produce more efficient algorithms with smaller resource 
requirements. Indeed, in the past, algorithms based on 
rounding LPs were often a first step towards achieving the 
same guarantees with purely combinatorial algorithms. 
Devising such algorithms is a direction of ongoing work. 



Gaurav Agarwal, David Kempe: Modularity-Maximizing Graph Communities via Mathematical Programming 






2 Preliminaries 

The network is given as an undirected graph $G = (V, E)$.
The adjacency matrix of $G$ is denoted by $A = (a_{u,v})$:
thus, $a_{u,v} = a_{v,u} = 1$ if $u$ and $v$ share an edge, and
$a_{u,v} = a_{v,u} = 0$ otherwise. The degree of a node $v$ is
denoted by $d_v$. A clustering $C = \{C_1, \ldots, C_k\}$ is a
partition of $V$ into disjoint sets $C_i$. We use $\gamma(v)$ to
denote the (unique) index of the cluster that node $v$ belongs to.

The modularity [40] of a clustering C is the total num- 
ber of edges inside clusters, minus the expected number 
of such edges if the graph were a uniformly random multi- 
graph subject to its degree sequence. In order to be able 
to compare the modularity for graphs of different sizes, 
it is convenient to normalize this difference by a factor
of $1/(2m)$, where $m$ is the number of edges, so that the
modularity is a number from the interval $[-1, 1]$.

If nodes $u, v$ have degrees $d_u, d_v$, then any one of the
$m$ edges has probability $2 \cdot \frac{d_u}{2m} \cdot \frac{d_v}{2m}$
of connecting $u$ and $v$ (the factor 2 arises because either
endpoint of the edge could be $u$ or $v$). By linearity of
expectation, the expected number of edges between $u$ and $v$ is
then $\frac{d_u d_v}{2m}$. Thus, the modularity of a clustering
$C$ is

$$Q(C) := \frac{1}{2m} \sum_{u,v} \left( a_{u,v} - \frac{d_u d_v}{2m} \right) \cdot \delta(\gamma(u), \gamma(v)), \qquad (1)$$

where $\delta$ denotes the Kronecker Delta, which is 1 iff its
arguments are identical, and 0 otherwise. Newman [37] terms the
matrix $M$ with entries $m_{u,v} := a_{u,v} - \frac{d_u d_v}{2m}$
the modularity matrix of $G$. For a more detailed discussion of
the probabilistic interpretation of modularity and
generalizations of the measure, see the recent paper by Gaertler
et al. [16].
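
As a minimal sketch of Equation (1), the following pure-Python function evaluates the modularity of a clustering directly from an adjacency matrix. The helper `adjacency` and the toy graph (two disjoint triangles) are assumptions made for illustration and do not come from the paper's experiments.

```python
def modularity(adj, labels):
    """Modularity Q of a clustering, following Equation (1).

    adj    -- symmetric 0/1 adjacency matrix as a list of lists
    labels -- labels[v] is the cluster index gamma(v) of node v
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # every edge is counted once from each endpoint
    total = 0.0
    for u in range(n):
        for v in range(n):
            if labels[u] == labels[v]:  # Kronecker delta
                total += adj[u][v] - degree[u] * degree[v] / two_m
    return total / two_m


def adjacency(n, edges):
    """Hypothetical helper: adjacency matrix of an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


# Toy graph (illustrative assumption): two disjoint triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
adj = adjacency(6, edges)
q_split = modularity(adj, [0, 0, 0, 1, 1, 1])  # one cluster per triangle
q_whole = modularity(adj, [0, 0, 0, 0, 0, 0])  # a single cluster
```

Putting all nodes into one cluster always yields modularity 0, because the entries of the modularity matrix sum to zero; separating the two triangles gives 0.5 on this toy graph.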

3 Algorithms 

3.1 Linear Programming based algorithm 

3.1.1 The Linear Program 

Based on Equation 1, we can phrase the modularity maximization
problem as an integer linear program (IP). (For an introduction
to Linear Programming, we refer the reader to [7, 25]; for the
technique of LP rounding, see [43].) The linear program has one
variable $x_{u,v}$ for each pair $(u, v)$ of vertices. We
interpret $x_{u,v} = 0$ to mean that $u$ and $v$ belong to the
same cluster, and $x_{u,v} = 1$ that $u$ and $v$ are in different
clusters. Then, the objective function to be maximized can be
written as $\sum_{u,v} m_{u,v} (1 - x_{u,v})$. This is a linear
function, because the $m_{u,v}$ are constants. We need to ensure
that the $x_{u,v}$ are consistent with each other: if $u$ and $v$
are in the same cluster, and $v$ and $w$ are in the same cluster,
then so are $u$ and $w$. This constraint can be written as a
linear inequality $x_{u,w} \le x_{u,v} + x_{v,w}$. It is not
difficult to see that the $x_{u,v}$ are consistent (i.e., define
a clustering) if and only if this inequality holds for all
triples $(u, v, w)$. Thus, we obtain the following integer linear
program (IP):

$$\begin{array}{ll}
\text{Maximize} & \frac{1}{2m} \cdot \sum_{u,v} m_{u,v} \cdot (1 - x_{u,v}) \\
\text{subject to} & x_{u,w} \le x_{u,v} + x_{v,w} \text{ for all } u, v, w \\
 & x_{u,v} \in \{0, 1\} \text{ for all } u, v
\end{array} \qquad (2)$$

Solving IPs is also NP-hard, and thus unlikely to be
possible in polynomial time. However, by replacing the
last constraint (that each $x_{u,v}$ be an integer from
$\{0, 1\}$) with the constraint that each $x_{u,v}$ be a real
number between 0 and 1, we obtain a linear program (LP). LPs can
be solved in polynomial time [25,26], and even quite
efficiently in practice. (For our experiments, we use the widely
used commercial package CPLEX.) The downside is that
the solution, being fractional, does not correspond to a
clustering. As a result, we have to apply a post-processing
step, called "rounding" of the LP.
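
The encoding behind program (2) can be illustrated without an LP solver. The sketch below (all function names are hypothetical, and the six-node graph is a made-up example of two triangles joined by one edge) converts a clustering into 0-1 variables, checks the triangle constraints, and evaluates the objective. It also shows a fractional point that is feasible for the LP relaxation but does not describe a clustering.

```python
from itertools import permutations


def modularity_matrix(adj):
    """Return (M, 2m) with M[u][v] = a_uv - d_u * d_v / (2m)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    mod = [[adj[u][v] - degree[u] * degree[v] / two_m for v in range(n)]
           for u in range(n)]
    return mod, two_m


def clustering_to_x(labels):
    """x[u][v] = 0 if u and v share a cluster, 1 otherwise."""
    n = len(labels)
    return [[0 if labels[u] == labels[v] else 1 for v in range(n)]
            for u in range(n)]


def is_feasible(x, tol=1e-9):
    """Check 0 <= x <= 1 and x[u][w] <= x[u][v] + x[v][w] for all triples."""
    n = len(x)
    in_range = all(-tol <= x[u][v] <= 1 + tol
                   for u in range(n) for v in range(n))
    triangles = all(x[u][w] <= x[u][v] + x[v][w] + tol
                    for u, v, w in permutations(range(n), 3))
    return in_range and triangles


def objective(mod, two_m, x):
    """Objective of (2): (1/2m) * sum over u, v of m_uv * (1 - x_uv)."""
    n = len(x)
    return sum(mod[u][v] * (1 - x[u][v])
               for u in range(n) for v in range(n)) / two_m


# Made-up example: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
n = 6
adj = [[0] * n for _ in range(n)]
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[u][v] = adj[v][u] = 1
mod, two_m = modularity_matrix(adj)

x_int = clustering_to_x([0, 0, 0, 1, 1, 1])
value = objective(mod, two_m, x_int)       # equals the modularity, 5/14

x_bad = clustering_to_x([0, 0, 0, 1, 1, 1])
x_bad[0][2] = x_bad[2][0] = 1              # 0~1 and 1~2, but 0 and 2 apart

x_frac = [[0 if u == v else 0.5 for v in range(n)] for u in range(n)]
```

Here `x_bad` violates a triangle constraint and is rejected, while `x_frac` satisfies every LP constraint even though it corresponds to no clustering, which is why a rounding step is needed.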

3.1.2 The LP Rounding Algorithm 

Our LP rounding algorithm is essentially identical to one
proposed by Charikar et al. [5] for the Correlation Clustering
problem [1]. In correlation clustering, one is given an
undirected graph $G = (V, E)$ with each edge labeled either '+'
(modeling similarity between endpoints) or '-' (modeling
dissimilarity). The goal is to partition the graph into clusters
such that few vertex pairs are classified incorrectly. Formally,
in the MinDisagree version of the problem, the goal is to
minimize the number of '-' edges inside clusters plus the number
of '+' edges between clusters. In the MaxAgree version, which is
not as relevant to our approach, the goal is to maximize the
number of '+' edges inside clusters plus the number of '-' edges
between clusters. Using the same 0-1 variables $x_{u,v}$ as we
did above, Charikar et al. [5] formulate MinDisagree as follows:

$$\begin{array}{ll}
\text{Minimize} & \sum_{(u,v) \in E^+} x_{u,v} + \sum_{(u,v) \in E^-} (1 - x_{u,v}) \\
\text{subject to} & x_{u,w} \le x_{u,v} + x_{v,w} \text{ for all } u, v, w \\
 & x_{u,v} \in \{0, 1\} \text{ for all } u, v,
\end{array}$$

where $E^+$ and $E^-$ denote the sets of edges labeled '+'
and '-', respectively. The objective can be rewritten as
$|E^+| - \sum_{(u,v) \in E} \mu_{u,v} (1 - x_{u,v})$, where
$\mu_{u,v}$ is 1 for '+' edges and $-1$ for '-' edges. The
objective is minimized when
$\sum_{(u,v) \in E} \mu_{u,v} (1 - x_{u,v})$ is maximized; thus,
except for the shift by the constant $|E^+|$, MinDisagree takes
on the same form as modularity maximization with
$m_{u,v} = \mu_{u,v}$.

The rounding algorithm proposed by Charikar et al. [5]
comes with an a priori error guarantee that the objective
produced is never more than 4 times the optimum. Algorithms
with such guarantees are called Approximation
Algorithms [43], and it would be desirable to design such
algorithms for Modularity Maximization as well. Unfortunately,
the shift by a constant prevents the approximation
guarantees from [5] from carrying over to the Modularity
Maximization problem. However, the analogy suggests
that algorithms for rounding the solution to the MinDisagree
LP may perform well in practice for Modularity
Maximization.

Our rounding algorithm, based on the one by Charikar
et al., first solves the linear program (2) without the
integrality constraints. This leads to a fractional assignment
$x_{u,v}$ for every pair of vertices. The LP constraints, applied
to fractional values $x_{u,v}$, exactly correspond to the
triangle inequality. Hence, the $x_{u,v}$ form a metric, and we
can interpret them as "distances" between the vertices. We
use these distances to repeatedly find clusters of "nearby"
nodes, which are then removed. The full algorithm is as
follows:



Algorithm 1 Modularity Maximization Rounding
1: Let $S = V$.
2: while $S$ is not empty do
3:   Select a vertex $u$ from $S$.
4:   Let $T_u$ be the set of vertices whose distance from $u$ is at most $1/2$.
5:   if the average distance of the vertices in $T_u \setminus \{u\}$ from $u$ is less than $1/4$ then
6:     Make $C = T_u$ a cluster.
7:   else
8:     Make $C = \{u\}$ a singleton cluster.
9:   end if
10:  Let $S = S \setminus C$.
11: end while



Step 3 of the rounding algorithm is underspecified: it 
does not say which of the remaining vertices u to choose 
as a center next. We found that selecting a random center 
in each iteration, and keeping the best among 1000 inde- 
pendent executions of the entire rounding algorithm, sig- 
nificantly outperformed two natural alternatives, namely 
selecting the largest or smallest cluster. In particular, se- 
lecting the largest cluster is a significantly inferior heuris- 
tic. 
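
The rounding loop with random centers can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the thresholds 1/2 and 1/4 are taken from the rounding of Charikar et al. [5], the ball around a center is restricted to vertices that are still unassigned, an empty ball is treated as a singleton, and the distance matrix `x` is supplied by the caller (here an integral toy metric instead of a real LP solution).

```python
import random


def modularity(adj, clusters):
    """Modularity of a clustering given as a list of node lists."""
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    q = 0.0
    for cluster in clusters:
        for u in cluster:
            for v in cluster:
                q += adj[u][v] - degree[u] * degree[v] / two_m
    return q / two_m


def round_once(x, rng):
    """One run of the rounding on a distance matrix x, with random centers."""
    remaining = set(range(len(x)))
    clusters = []
    while remaining:
        u = rng.choice(sorted(remaining))  # random center
        # Ball of radius 1/2 around u, among unassigned vertices (assumption).
        ball = [v for v in remaining if v == u or x[u][v] <= 0.5]
        others = [v for v in ball if v != u]
        if others and sum(x[u][v] for v in others) / len(others) < 0.25:
            cluster = ball   # nearby vertices are close on average
        else:
            cluster = [u]    # otherwise u becomes a singleton
        clusters.append(sorted(cluster))
        remaining -= set(cluster)
    return clusters


def best_rounding(adj, x, trials=1000, seed=0):
    """Repeat the randomized rounding and keep the best clustering found."""
    rng = random.Random(seed)
    best, best_q = None, float("-inf")
    for _ in range(trials):
        clusters = round_once(x, rng)
        q = modularity(adj, clusters)
        if q > best_q:
            best, best_q = clusters, q
    return best, best_q


# Toy input: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
n = 6
adj = [[0] * n for _ in range(n)]
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[a][b] = adj[b][a] = 1
labels = [0, 0, 0, 1, 1, 1]
x = [[0.0 if labels[a] == labels[b] else 1.0 for b in range(n)]
     for a in range(n)]
best, best_q = best_rounding(adj, x, trials=20)

# A uniformly "undecided" fractional metric rounds to singletons.
x_flat = [[0.0 if a == b else 0.4 for b in range(3)] for a in range(3)]
flat = round_once(x_flat, random.Random(1))
```

When the distances are already integral, every run recovers the encoded clustering; randomization only matters for fractional solutions, where different centers can produce different clusters.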

As a post-processing step to the LP rounding, we run 
a local-search algorithm proposed by Newman [37] to re- 
fine the results further. The post-processing step is briefly 
described below. 

An important benefit of the LP rounding method is
that it provides an upper bound on the best solution: the
best clustering is the optimum solution to the integer
program (2), and removing the integrality constraint can only
enlarge the set of feasible solutions, so the optimal LP value
is at least the maximum modularity. The upper bound
enables us to lower-bound the performance of clustering
algorithms.
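
Spelled out as a worked inequality, the a posteriori guarantee reads as follows. The symbols $Q^{*}$ (maximum modularity) and $Q_{\mathrm{LP}}$ (optimal value of the LP relaxation) are notation introduced here for illustration, and the ratio form assumes $Q(C) \ge 0$ and $Q^{*} > 0$.

```latex
% Q(C): modularity of the clustering found by any algorithm
% Q^*: maximum modularity; Q_LP: optimal value of the relaxed program (2)
Q(C) \;\le\; Q^{*} \;\le\; Q_{\mathrm{LP}}
\quad\Longrightarrow\quad
\frac{Q(C)}{Q^{*}} \;\ge\; \frac{Q(C)}{Q_{\mathrm{LP}}} .
```

For instance, a clustering whose modularity is at least 99% of the LP value is guaranteed to be within 1% of optimal, even though the optimum itself is never computed.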

The other useful feature of our algorithm is its inherent 
capability to find different clusterings with similar modu- 
larity. The randomization naturally leads to different so- 
lutions, of which several with highest modularity values 
can be retained, to provide a more complete picture of 
possible cluster boundaries. 

3.2 Vector Program Based Algorithm 

In this section, we present a second algorithm which is
more efficient in practice, at the cost of slightly reduced
performance. It produces a "hierarchical clustering", in
the sense that the clustering is obtained by repeatedly
finding a near-optimal division of a larger cluster. For two
reasons, this clustering is not truly hierarchical: First, we
do not seek to optimize a global function of the entire 
hierarchy, but rather optimize each split locally. Second, 
we again apply a local search based post-processing step 
to improve the solution, thus rearranging the clusters. De- 
spite multiple recently proposed hierarchical clustering al- 
gorithms (e.g., [18, 37, 41]), there is far from general agree- 
ment on what objective functions would capture a "good" 
hierarchical clustering. Indeed, different objective func- 
tions can lead to significantly different clusterings. While 
our clustering is not truly hierarchical, the order and po- 
sition of the splits that it produces still reveal much high- 
level information about the network and its clusters. 

As discussed above, our approach is to aim for the 
best division at each level individually, requiring a parti- 
tion into two clusters at each level. Clusters are recursively 
subdivided as long as an improvement is possible. Thus, 
a solution hinges on being able to find a good partition 
of a given graph into two communities. The LP rounding 
algorithm presented in the previous section is not appli- 
cable to this problem, as it does not permit specifying the 
number of communities. Instead, we will use a Vector Pro- 
gramming (VP) relaxation of a Quadratic Program (QP) 
to find a good partition of a graph into two communities. 

3.2.1 The Quadratic Program 

Our approach is motivated by the same observation that
led Newman [37] to an eigenvector-based partitioning
approach. For every vertex $v$, we have a variable $y_v$ which
is 1 or $-1$ depending on whether the vertex is in one or
the other partition. Since each pair $u, v$ adds $m_{u,v}$ to
the objective iff $u$ and $v$ are in the same partition (and
zero otherwise), the objective function can be written as
$\frac{1}{4m} \sum_{u,v} m_{u,v} (1 + y_u y_v)$. Newman [37]
rewrites this term further as $\frac{1}{4m} y^T M y$ (where $y$
is the vector of all $y_v$ values), and observes that if the
entries $y_v$ were not restricted to be $\pm 1$, then the optimal
$y$ would be the principal eigenvector of $M$. His approach, in
line with standard spectral partitioning approaches (e.g., [12]),
is then to compute the principal eigenvector $y$, and partition
the nodes into positive $y_v$ and negative $y_v$. Thus, in a
sense, Newman's approach can be considered as embedding the nodes
optimally on the line, and then rounding the fractional solution
into nodes with positive and negative coordinates.

Our solution also first embeds the nodes into a met- 
ric space, and then rounds the locations to obtain two 
communities. However, it is motivated by considering the 
objective function as a strict quadratic program (see, e.g.,
[43]). We can write the problem of partitioning the graph
into two communities of maximum modularity as

$$\begin{array}{ll}
\text{Maximize} & \frac{1}{4m} \sum_{u,v} m_{u,v} \cdot (1 + y_u y_v) \\
\text{subject to} & y_v^2 = 1 \text{ for all } v.
\end{array} \qquad (3)$$

Notice that the constraint $y_v^2 = 1$ ensures that each $y_v$ is
$\pm 1$ in a solution to (3).
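
The step from the sum over pairs to the quadratic form $y^T M y$ attributed to Newman above can be made explicit. The derivation below uses only the definitions of Section 2: for $y_u, y_v \in \{\pm 1\}$ the Kronecker delta equals $(1 + y_u y_v)/2$, and the entries of the modularity matrix sum to zero.

```latex
\frac{1}{2m}\sum_{u,v} m_{u,v}\,\delta(\gamma(u),\gamma(v))
  = \frac{1}{2m}\sum_{u,v} m_{u,v}\,\frac{1 + y_u y_v}{2}
  = \frac{1}{4m}\sum_{u,v} m_{u,v}
    + \frac{1}{4m}\sum_{u,v} y_u\, m_{u,v}\, y_v
  = \frac{1}{4m}\, y^{T} M y,
\\[6pt]
\text{since}\quad
\sum_{u,v} m_{u,v}
  = \sum_{u,v} a_{u,v}
    - \frac{1}{2m}\Big(\sum_{u} d_u\Big)\Big(\sum_{v} d_v\Big)
  = 2m - \frac{(2m)^2}{2m} = 0 .
```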

Quadratic Programming, too, is NP-complete. Hence,
we use the standard technique of relaxing the QP (3) to
a corresponding Vector Program (VP), which in turn can
be solved in polynomial time using semi-definite programming
(SDP). To turn a strict quadratic program into a
vector program, one replaces each variable $y_v$ with an
($n$-dimensional) vector-valued variable $\mathbf{y}_v$, and each
product $y_u y_v$ with the inner product
$\mathbf{y}_u \cdot \mathbf{y}_v$. We use the standard
process [43] for transforming the VP formulation to the
SDP formulation and for obtaining back the solution to
the VP from the solution to SDP. For solving the SDP
problems in our experiments, we use a standard off-the-shelf
solver CSDP [2].

The result of solving the VP will be vectors $\mathbf{y}_v$, for
all vertices $v$, which can be interpreted as an embedding of
the nodes on the surface of the hypersphere in $n$ dimensions.
(The constraint $\mathbf{y}_v \cdot \mathbf{y}_v = 1$ for all $v$
ensures that all nodes are embedded at distance 1 from the
origin.) Thus, the inner product
$\mathbf{y}_u \cdot \mathbf{y}_v$ of two node positions is equal
to the cosine of the angle between them. As a result, the
optimal VP solution will "tend to" have node pairs with
negative $m_{u,v}$ far apart (large angles), and node pairs with
positive $m_{u,v}$ close (small angles).
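
To make this geometric picture concrete, the sketch below evaluates the VP objective for a given set of unit vectors and then separates the nodes with a random hyperplane through the origin, the kind of random cut of the hypersphere mentioned in the Introduction. The embedding is hand-made for a toy graph (two triangles joined by one edge); it is an assumption for illustration and not the output of an SDP solver, and the function names are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def vp_objective(adj, vectors):
    """(1/4m) * sum over u, v of m_uv * (1 + y_u . y_v) for unit vectors."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    total = 0.0
    for u in range(n):
        for v in range(n):
            m_uv = adj[u][v] - degree[u] * degree[v] / two_m
            total += m_uv * (1 + dot(vectors[u], vectors[v]))
    return total / (2 * two_m)  # 1/(4m) = 1/(2 * 2m)


def hyperplane_cut(vectors, rng):
    """Split nodes by the sign of their projection on a Gaussian normal."""
    dim = len(vectors[0])
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    side_a = [v for v, y in enumerate(vectors) if dot(y, normal) >= 0]
    side_b = [v for v, y in enumerate(vectors) if dot(y, normal) < 0]
    return side_a, side_b


# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
n = 6
adj = [[0] * n for _ in range(n)]
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[a][b] = adj[b][a] = 1

# Hand-made embedding: the two triangles sit at antipodal points.
vectors = [(1.0, 0.0)] * 3 + [(-1.0, 0.0)] * 3
value = vp_objective(adj, vectors)
side_a, side_b = hyperplane_cut(vectors, random.Random(0))
```

Because the two groups are antipodal, any hyperplane whose normal has a nonzero first coordinate separates them, and the VP objective of this embedding coincides with the modularity of the corresponding bipartition.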



3.2.2 Rounding the Quadratic Program

To obtain a partition from the node locations $\mathbf{y}_v$, we
use a rounding procedure proposed by Goemans and Williamson
[20] for the Max-Cut problem. In the Max-Cut problem,
an undirected graph is to be partitioned into two disjoint
node sets so as to maximize the number of edges
crossing between them. This objective can be written as a
quadratic program as follows (notice the similarity to the
Modularity Maximization QP):

$$\begin{array}{ll}
\text{Maximize} & \frac{1}{2} \sum_{(u,v) \in E} (1 - y_u y_v) \\
\text{subject to} & y_v^2 = 1 \text{ for all } v.
\end{array}$$

The rounding procedure of Goemans and Williamson
[20], which we adopt here, chooses a random
$(n-1)$-dimensional hyperplane passing through the origin, and
uses the hyperplane to cut the hypersphere into two halves.
The two partitions are formed by picking the vertices lying
on each side of the hyperplane. The cutting hyperplane
is represented by its normal vector $s$, which is an
$n$-dimensional vector, each of whose components is an
independent $\mathcal{N}(0, 1)$ Gaussian. (It is well known and
easy to verify that this makes the direction of the normal
uniformly random.) To cut the hypersphere, we simply define
$S := \{v \mid \mathbf{y}_v \cdot s \ge 0\}$ and
$\bar{S} := \{v \mid \mathbf{y}_v \cdot s < 0\}$. Once the
VP has been solved (which is the expensive part), one can
easily choose multiple random hyperplanes, and retain the
best resulting partition. In our experiments, we chose the
best of 5000 hyperplanes.

A different approach to rounding VP solutions of the
form (3) was recently proposed by Charikar and Wirth
[6], again in the context of Correlation Clustering. Their
method first projects the hypersphere on a random line,
scales down large coordinates, and then rounds randomly.
Their method gives an a priori error guarantee of
$\Omega(1 / \log n)$ under the assumption that all diagonal
entries of the matrix $M$ are zero. In fact, if the matrix is
also positive semi-definite, then a result of Nesterov [33]
shows that the approximation guarantee can be improved to
$2/\pi$. Unfortunately, the modularity matrix $M$ is neither
positive semi-definite nor does it have an all-zero diagonal;
hence, neither of these approximation results is applicable to
the problem of finding the modularity-maximizing partition
into two communities.

We also implemented the rounding procedure of [6],
and tested it on the same example networks as the other
algorithms. We found that its performance is always inferior
to the hyperplane based algorithm, sometimes significantly
so. Since the algorithm is not more efficient, either,
we omit the results from our comparison in Section 4.

3.2.3 The Hierarchical Clustering Algorithm



Note that the effect of partitioning a community $C$ further
into two sub-communities $C', C''$ is independent of the
structure of the remaining communities, because any edge
inside one of the other communities remains inside, and
the expected number of edges inside other communities
also stays the same. Thus, in splitting $C$ into $C'$ and $C''$,
the modularity $Q$ increases by

$$\Delta Q(C) = \frac{1}{m} \left( \frac{\left(\sum_{v \in C'} d_v\right) \left(\sum_{u \in C''} d_u\right)}{2m} - |e(C', C'')| \right),$$

where $e(C', C'')$ denotes the set of edges between $C'$ and
$C''$.

The target communities $C', C''$ are calculated using
the above VP rounding, and the algorithm will terminate
when none of the $\Delta Q(C)$ are positive. The full algorithm
is given below.
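
The gain formula can be checked numerically. The following sketch (function names and toy graph are illustrative assumptions) computes the gain from the degrees and the number of crossing edges, always using degrees and $m$ of the entire graph, and compares it with the difference of two directly computed modularity values.

```python
def modularity(adj, clusters):
    """Modularity of a clustering given as a list of node lists."""
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    q = 0.0
    for cluster in clusters:
        for u in cluster:
            for v in cluster:
                q += adj[u][v] - degree[u] * degree[v] / two_m
    return q / two_m


def split_gain(adj, part_a, part_b):
    """Change in modularity when the cluster part_a + part_b is split."""
    degree = [sum(row) for row in adj]  # degrees in the entire graph
    m = sum(degree) / 2                 # edges in the entire graph
    deg_a = sum(degree[v] for v in part_a)
    deg_b = sum(degree[u] for u in part_b)
    crossing = sum(adj[v][u] for v in part_a for u in part_b)
    return (deg_a * deg_b / (2 * m) - crossing) / m


# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
n = 6
adj = [[0] * n for _ in range(n)]
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[a][b] = adj[b][a] = 1

# Splitting the whole graph along the bridge increases modularity.
gain_top = split_gain(adj, [0, 1, 2], [3, 4, 5])
direct_top = (modularity(adj, [[0, 1, 2], [3, 4, 5]])
              - modularity(adj, [[0, 1, 2, 3, 4, 5]]))

# Splitting a triangle further decreases it (negative gain).
gain_sub = split_gain(adj, [3], [4, 5])
direct_sub = (modularity(adj, [[0, 1, 2], [3], [4, 5]])
              - modularity(adj, [[0, 1, 2], [3, 4, 5]]))
```

The second comparison illustrates the independence claim: the gain of splitting {3, 4, 5} is the same whether or not the other cluster is taken into account.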

The use of a Max-Heap is not strictly necessary; a set 
of active communities would have been sufficient. How- 
ever, the choice of a Max-Heap has the added advantage 
that by slightly tweaking the termination condition
(requiring an increase greater than some $\epsilon$), one can
force the communities to be larger, and the algorithm to
terminate faster.

It is important that in each iteration of the algorithm, 
the degrees for each vertex v and the total number of 
edges m be calculated by taking into account all the edges 
in the entire graph and not just the edges belonging to the 
sub-graph being partitioned. 

Algorithm 2 Hierarchical Clustering
1: Let M be an empty Max-Heap.
2: Let $C$ be a cluster containing all the vertices.
3: Use VP rounding to calculate (approximately) the maximum increase in modularity, $\Delta Q(C)$, achievable by dividing $C$ into two partitions.
4: Add $(C, \Delta Q(C))$ to M.
5: while the head element in M has $\Delta Q(C) > 0$ do
6:   Let $C$ be the head of M.
7:   Use VP rounding to split $C$ into two partitions $C', C''$, and calculate $\Delta Q(C')$, $\Delta Q(C'')$.
8:   Remove $C$ from M.
9:   Add $(C', \Delta Q(C'))$, $(C'', \Delta Q(C''))$ to M.
10: end while
11: Output as the final partitioning all the partitions remaining in the heap M, as well as the hierarchy produced.
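
The control flow of Algorithm 2 can be sketched with Python's `heapq` module. Two simplifications are assumptions of this sketch and not part of the paper: the bipartition step is a brute-force search over all splits (a stand-in for solving and rounding the VP, usable only on tiny clusters), and the best split is stored in the heap entry so that it is not recomputed when the cluster is popped. The parameter `eps` corresponds to the optional threshold in the termination condition.

```python
import heapq
from itertools import combinations


def split_gain(adj, part_a, part_b):
    """Change in modularity for splitting part_a + part_b (global degrees)."""
    degree = [sum(row) for row in adj]
    m = sum(degree) / 2
    deg_a = sum(degree[v] for v in part_a)
    deg_b = sum(degree[u] for u in part_b)
    crossing = sum(adj[v][u] for v in part_a for u in part_b)
    return (deg_a * deg_b / (2 * m) - crossing) / m


def best_split_bruteforce(adj, cluster):
    """Stand-in for VP rounding: try every bipartition of a tiny cluster."""
    best = (float("-inf"), None, None)
    first, rest = cluster[0], cluster[1:]
    for size in range(len(rest)):  # keeps the second part non-empty
        for chosen in combinations(rest, size):
            part_a = [first, *chosen]
            part_b = [v for v in rest if v not in chosen]
            gain = split_gain(adj, part_a, part_b)
            if gain > best[0]:
                best = (gain, part_a, part_b)
    return best


def hierarchical_clustering(adj, split=best_split_bruteforce, eps=0.0):
    """Repeatedly split the cluster with the largest positive gain."""
    heap, counter = [], 0  # heapq is a min-heap, so gains are negated

    def push(cluster):
        nonlocal counter
        gain, part_a, part_b = split(adj, cluster)
        # The counter breaks ties so that lists are never compared.
        heapq.heappush(heap, (-gain, counter, cluster, part_a, part_b))
        counter += 1

    push(list(range(len(adj))))
    while heap and -heap[0][0] > eps:
        _, _, _, part_a, part_b = heapq.heappop(heap)
        push(part_a)
        push(part_b)
    return sorted(sorted(entry[2]) for entry in heap)


# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
n = 6
adj = [[0] * n for _ in range(n)]
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[a][b] = adj[b][a] = 1
communities = hierarchical_clustering(adj)
```

On this toy graph the first split separates the two triangles, and no further split has positive gain, so the loop stops with two communities.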



As a post-processing step, we run the local-search al- 
gorithm proposed by Newman [37]. The post-processing 
brings the VP results nearly to par with those obtained 
by the LP method. 

3.3 Local Search Algorithm 

We use the local-search algorithm proposed by Newman
[37] for refining the results obtained by our LP and VP
methods. This method improved the modularity of the
partitions produced by the LP method by less than 1%,
and in the case of the VP method, it improved the
modularity by less than 5%. The local search method is based
on the Kernighan-Lin algorithm for graph bisection [27]. 
Starting from some initial network clustering, the modu- 
larity is iteratively improved as follows: select the vertex 
which, when moved to another group, results in the max- 
imum increase in modularity (or minimum decrease, if no 
increase is possible). In one complete iteration, each ver- 
tex changes its group exactly once; at the end of the iter- 
ation, the intermediate clustering with the highest modu- 
larity value is selected as the new clustering. This process 
is continued as long as there is an increase in the overall
modularity. For details of the implementation, we refer
the reader to [37].
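
A simplified sketch of such a Kernighan-Lin style pass is given below. It is an illustration under stated assumptions, not Newman's implementation: modularity is recomputed from scratch for every candidate move (fine for toy inputs, far too slow for real networks), vertices are only moved into groups that currently exist, and ties are broken by iteration order.

```python
def modularity(adj, labels):
    """Modularity of the clustering in which node v has label labels[v]."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    total = 0.0
    for u in range(n):
        for v in range(n):
            if labels[u] == labels[v]:
                total += adj[u][v] - degree[u] * degree[v] / two_m
    return total / two_m


def local_search(adj, labels):
    """Repeat passes in which every vertex moves once; keep the best state."""
    labels = list(labels)
    best_q = modularity(adj, labels)
    improved = True
    while improved:
        improved = False
        current = list(labels)
        unmoved = set(range(len(adj)))
        pass_best_q, pass_best = best_q, None
        while unmoved:
            groups = set(current)
            move = None  # (resulting modularity, vertex, target group)
            for v in unmoved:
                for g in groups:
                    if g == current[v]:
                        continue
                    trial = list(current)
                    trial[v] = g
                    q = modularity(adj, trial)
                    if move is None or q > move[0]:
                        move = (q, v, g)
            if move is None:  # only one group is left, nothing to move
                break
            q, v, g = move
            current[v] = g    # apply the best move, even if it lowers Q
            unmoved.discard(v)
            if q > pass_best_q + 1e-12:
                pass_best_q, pass_best = q, list(current)
        if pass_best is not None:  # best intermediate state of this pass
            labels, best_q, improved = pass_best, pass_best_q, True
    return labels, best_q


# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3);
# the starting clustering misplaces node 2.
n = 6
adj = [[0] * n for _ in range(n)]
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[a][b] = adj[b][a] = 1
start = [0, 0, 1, 1, 1, 1]
start_q = modularity(adj, start)
final, final_q = local_search(adj, start)
```

Allowing moves that temporarily lower the modularity within a pass is what distinguishes this scheme from plain greedy improvement; only the best intermediate clustering of each pass is kept.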



4 Examples 

In this section, we present results for both of our algo- 
rithms on several real-world networks. We focus on well- 
studied networks since our goal in this paper is to com- 
pare the quality of optimization achieved by our methods 
to approaches in past work, rather than discovering novel 
structure. We restrict our attention here to networks with 
at most a few thousand nodes, as this is currently the limit 
for our algorithms. The algorithm implementations are
available online at http://www-scf.usc.edu/~gaurava.

We evaluate our results in two ways: manually and
by comparing against past work. For several smaller
networks, we show below the clusterings obtained by the LP
rounding algorithm. In all of the cases, the clusterings can
be seen to be closely correlated with some known "semantic"
information about the network.

Fig. 1. The optimal community structure with modularity
0.4197 for Zachary's Karate Club network. Each community is
shaded with a different color.



4.1 Zachary's Karate Club 

The "Zachary's Karate Club" [45] network represents the
friendships between 34 members of a karate club in the US 
over a period of 2 years. It has come to be a standard test 
network for clustering algorithms, partly due to the fact 
that during the observation period, the club broke up into 
two separate clubs over a conflict, and the resulting two 
new clubs can be considered a "ground truth" clustering. 
Both of our algorithms find a community structure identical
to the one detected by Medus et al. [32]. It has a modularity
of 0.4197. Our algorithm also proves this value to be
best possible, because the LP returned a {0, 1}-solution,
i.e., no rounding was necessary. The community structure
found for the Karate Club network is shown in Figure 1.

For finding the primary two-community division in this 
network, we ran a single iteration of the VP algorithm 
and found a partition identical to that found by Medus 
et al. [32]. This partition corresponds almost exactly to 
the actual factions in the club, with the exception of node 
10. The bipartition found by the VP method has a mod- 
ularity of 0.3718, whereas the partition corresponding to 
the actual factions in the club has a lower modularity of 
0.3715. This explains the "misclassification" of node 10, 
and also emphasizes that no clustering objective can be 
guaranteed to always recover the "semantically correct" 
community structure in a real network. The latter should 
be taken as a cautioning against accepting modularity- 
maximizing clusterings as ground truth. 

4.2 College Football 

This data set representing the schedule of Division I foot- 
ball games for the 2000 season was compiled by Girvan 
and Newman [18]. Vertices in the graph represent teams,
and edges represent regular season games between the two 
teams they connect. The teams are divided into confer- 
ences with 8-12 teams each. Usually, more games are 
played within conferences than across conferences, and 
it is an interesting question whether the ground truth of 
conferences can be reconstructed by observing the games 
played. Both our algorithms find the same clustering with 
modularity 0.6046, shown in Figure 2. The algorithms ac- 
curately recover most of the conferences as well as the 
independent teams (which do not belong to any conference).

Our algorithms also found a slightly suboptimal clustering
of modularity 0.6044, combining two prominent conferences,
Mountain West and Pacific 10 (brown squares and gray
hexagons in the top right corner), into one community. The
reason is that many games were played between teams of the
two conferences. This shows that community detection is
inherently unstable: solutions with
only slightly different modularity (differing only by 0.0002) 
can differ significantly. Such slight differences could easily 
elude heuristic algorithms. More importantly, this insta- 
bility shows again that communities maximizing modular- 
ity should be evaluated carefully for semantic relevance. 
With respect to such instabilities in community structure, 
Gfeller et al. [17] give a more detailed analysis and provide 
methods for detecting them. However, their methods are 
applicable only to non-randomized clustering algorithms.

This example illustrates an advantage of our random- 
ized rounding algorithms, which produce multiple differ- 
ent solutions. These solutions together often reveal more 
information about community boundaries. They can also 
be manually inspected if desired, and a researcher with 
domain knowledge can pick the one representing the true 
underlying structure most accurately. 
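
One simple way to combine several rounded solutions is to count how often each pair of nodes ends up in the same community. The sketch below is an illustration only: the function name and the three hypothetical runs on four nodes are assumptions and are not taken from the experiments. Pairs whose frequency lies strictly between 0 and 1 mark boundaries on which the solutions disagree.

```python
from itertools import combinations

def coassignment_frequency(partitions):
    """Fraction of partitions in which each node pair shares a community."""
    nodes = sorted(partitions[0])
    counts = {pair: 0 for pair in combinations(nodes, 2)}
    for community_of in partitions:
        for u, v in counts:
            if community_of[u] == community_of[v]:
                counts[(u, v)] += 1
    return {pair: c / len(partitions) for pair, c in counts.items()}

# Hypothetical community labels from three rounding runs.
runs = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 0, "b": 0, "c": 1, "d": 1},
]
freq = coassignment_frequency(runs)

# Pairs on which the runs disagree; here every such pair involves node "c".
unstable = {pair: f for pair, f in freq.items() if 0 < f < 1}
```

In this toy input, nodes "a" and "b" are always grouped together, while node "c" switches sides in one run, so it is the natural candidate for manual inspection.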




Fig. 2. The partitioning of the College Football network found 
by the LP rounding algorithm. Each detected community is 
shaded with a different color. The actual conferences are de- 
picted using different shapes. 



4.3 Books on American Politics 

As a final example, Figure 3 shows the community structure
detected in the American Political Books network
compiled by V. Krebs. The vertices represent books on 
American politics bought from amazon.com, and edges 
connect pairs of books frequently co-purchased. The books 
in this network were classified by Newman [38] into cat- 
egories liberal or conservative, except for a small number 
of books with no clear ideological leaning. Figure 3 shows 
that our algorithm accurately detects a strong commu- 
nity structure, which matches fairly well the underlying 
semantic division based on political slant. 




Fig. 3. The partitioning of the American Political Books net- 
work found by the LP rounding algorithm. Each detected com- 
munity is shaded with a different color, while actual political 
slants are depicted using different shapes. The circles are liberal 
books, the triangles are conservative books, and the squares are 
centrist. 

The community structure produced by our LP algo- 
rithm has a modularity of 0.5272 and agrees mostly with 
the manual labeling. It is very similar to the one produced
by Newman [37], except for an extra cluster of three nodes
produced by our method, as well as slightly fewer "mis- 
classified" nodes in the two main clusters. The three books 
in the additional cluster were biographical in nature, and 
were always bought together. The additional cluster is not 
found by the VP method, which instead merges the three 
biographical books with the blue cluster, and obtains a 
modularity of 0.5269. 

For finding the primary division in this network, we ran
a single iteration of the VP algorithm. The resulting partition
has a modularity of 0.4569. It assigns all the liberal books
and three of the conservative books to one cluster and the
remaining conservative books to the other cluster. The centrist
books were divided roughly evenly between the two clusters.

We also computed the modularity values for various 
"ground truth" partitionings. If the books are divided into 
three communities corresponding to liberal, conservative, 
and centrist (according to a manual labeling), the modu- 
larity is significantly inferior to our best clustering, namely 
0.4149. If the centrist books are completely grouped with 
either the liberal or the conservative books, the modularity
deteriorates further to 0.3951 and 0.4088, respectively, which
is noticeably worse than the modularity of 0.4569 achieved by the
bipartition of our algorithm. This corroborates an observation
already made in discussing the Zachary Karate Club:
the semantic ground truth partitioning will not necessar- 
ily achieve the highest modularity as a network partition, 
and hence, the two should not be treated as identical. 



4.4 Other Examples 

We tested our methods on several other networks and were 
able to identify community structures with very high mod- 
ularity values. The test networks included a collaboration 
network of jazz musicians (JAZZ) [19], the social network
of a community of 62 bottlenose dolphins (DOLPH) living
in Doubtful Sound, New Zealand [31], an interaction
network of the characters from Victor Hugo's novel Les 
Miserables (MIS) [30], a collaboration network (COLL) 
of scientists who conduct research on networks [34], a 
metabolic network for the nematode C. elegans (META)
[24] and a network of email contacts between students 
and faculty (EMAIL) [22]. 

We compare our algorithms against past published partitioning
heuristics, specifically, the edge-betweenness based
algorithm of Girvan and Newman [18] (denoted by GN),
the extremal optimization algorithm of Duch and Arenas
[11] (DA), and the eigenvector based algorithm of Newman
[37,38] (EIG). The bottom-up heuristic of Clauset, Moore,
and Newman [9] is designed not so much to yield close-to-optimal
clusterings as to give reasonable clusterings for
extremely large networks (several orders of magnitude beyond
what our algorithms can deal with); the performance
of their heuristic is significantly inferior to that of the
other methods.



Network   size n   GN      DA      EIG     VP      LP      UB
KARATE        34   0.401   0.419   0.419   0.420   0.420   0.420
DOLPH         62   0.520   -       -       0.526   0.529   0.531
MIS           76   0.540   -       -       0.560   0.560   0.561
BOOKS        105   -       -       0.526   0.527   0.527   0.528
BALL         115   0.601   -       -       0.605   0.605   0.606
JAZZ         198   0.405   0.445   0.442   0.445   0.445   0.446
COLL         235   0.720   -       -       0.803   0.803   0.805
META         453   0.403   0.434   0.435   0.450   -       -
EMAIL       1133   0.532   0.574   0.572   0.579   -       -

Table 1. The modularity obtained by many of the previously 
published methods and by the methods introduced in this pa- 
per, along with the upper bound. 



Both the LP and VP rounding algorithms outperformed 
all other methods in terms of the value of modularity ob- 
tained. We summarize the results obtained by all algo- 
rithms as well as the upper bound (denoted by UB) in 
Table 1. (Some LP heuristic and upper bound entries for 
larger data sets are missing, because the LP solver could 
not solve such large instances.) Notice that it is not clear 
whether the upper bound can in fact be attained by any 
clustering. It is, however, striking how close to the upper
bound the clusterings found by the LP and VP rounding
algorithms are.
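
The distance to the upper bound can be quantified as the relative gap (UB - Q)/UB. The following sketch applies this to the LP and UB columns of Table 1 for the networks where both entries are available. The variable and function names are illustrative assumptions, and the inputs are the rounded three-digit values from the table, so the resulting gaps are only approximate.

```python
# (modularity found by LP rounding, LP upper bound), as reported in Table 1
table1 = {
    "KARATE": (0.420, 0.420),
    "DOLPH": (0.529, 0.531),
    "MIS": (0.560, 0.561),
    "BOOKS": (0.527, 0.528),
    "BALL": (0.605, 0.606),
    "JAZZ": (0.445, 0.446),
    "COLL": (0.803, 0.805),
}

def relative_gap(found, upper_bound):
    """Fraction of the upper bound that the clustering fails to attain."""
    return (upper_bound - found) / upper_bound

gaps = {name: relative_gap(lp, ub) for name, (lp, ub) in table1.items()}
worst = max(gaps, key=gaps.get)   # network with the largest relative gap
```

With these rounded inputs, every gap stays below 0.4%, and the largest one occurs for DOLPH.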

5 Conclusion 

We have shown that the technique of rounding solutions 
to fractional mathematical programs yields high-quality 
modularity maximizing communities, while also providing 
a useful upper bound on the best possible modularity. 

The drawback of our algorithms is their resource requirement.
Due to O(n^3) constraints in the LP, and O(n^2)
variables in the VP, the algorithms currently do not scale
beyond about 300 and 4000 nodes, respectively. Thus, a central goal
for future work would be to improve the running time 
without sacrificing solution quality. An ideal outcome would 
be a purely combinatorial algorithm avoiding the explicit 
solution to the mathematical programs, but yielding the 
same performance. 
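
A back-of-the-envelope calculation gives a feeling for these limits. The sketch below assumes that the number of LP constraints grows as n^3 and the number of VP variables as n^2, and it ignores all constant factors; the function names are hypothetical.

```python
def lp_constraints(n):
    # Assumption: cubic growth in the number of nodes, constants ignored.
    return n ** 3

def vp_variables(n):
    # Assumption: quadratic growth in the number of nodes, constants ignored.
    return n ** 2

lp_at_limit = lp_constraints(300)    # 27,000,000
vp_at_limit = vp_variables(4000)     # 16,000,000
```

Under these assumptions, both programs reach the order of 10^7 constraints or variables at their respective practical limits.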

Secondly, while our algorithms perform very well on all 
networks we considered, they do not come with an a priori
guarantee on their performance. Heuristics with such performance
guarantees are called approximation algorithms
[43], and are desirable because they give the user a hard 
guarantee on the solution quality, even for pathological 
networks. Since the algorithms of Charikar et al. and Goe- 
mans and Williamson on which our approaches are based 
do have provable approximation guarantees, one would 
hope that similar guarantees could be attained for modu- 
larity maximization. However, this does not hold for the 
particular algorithms we use, due to the shift of the ob- 
jective function by a constant. Obtaining approximation 
algorithms for modularity maximization thus remains a 
challenging direction for future work. 